Support reverse page ordering in sort pushdown phase 1#19848
Support reverse page ordering in sort pushdown phase 1#19848GaneshPatil7517 wants to merge 7 commits intoapache:mainfrom
Conversation
- Add reverse_pages flag to ParquetSource and ParquetOpener - Wire reverse_pages through try_reverse_output() alongside reverse_row_groups - Extend try_reverse_output() to set both flags when optimizing descending sorts - Add comprehensive test coverage for reverse_pages functionality - Align with existing reverse_row_groups infrastructure pattern Phase 1 establishes the foundation for page-level reverse ordering. Actual page-level reversal implementation will be completed in Phase 2 when arrow-rs provides page-level API support. Tests: - All 27 reverse-related tests pass - Added 4 new tests for reverse_pages flag - Verified backward compatibility Closes apache#19486
There was a problem hiding this comment.
Pull request overview
This PR implements Phase 1 infrastructure for reverse page ordering in Parquet sort pushdown optimization. The change adds a reverse_pages flag that follows the same pattern as the existing reverse_row_groups flag, establishing the foundation for future page-level reversal implementation.
Changes:
- Added
reverse_pagesboolean field toParquetSourceandParquetOpenerstructs with corresponding getter/setter methods - Extended
try_reverse_output()to set bothreverse_row_groupsandreverse_pagesflags when optimizing descending sorts - Added display formatting support to show when
reverse_pagesis enabled
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| datafusion/datasource-parquet/src/source.rs | Added reverse_pages field with methods, updated try_reverse_output() to set the flag, added display formatting, and included comprehensive unit tests |
| datafusion/datasource-parquet/src/opener.rs | Added reverse_pages field to ParquetOpener struct with #[expect(dead_code)] attribute and updated test builder |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
This attribute suppresses the dead code warning for Phase 1 implementation. The method will be used in Phase 2 when actual page reversal is implemented.
5001db0 to
c8010fc
Compare
All test snapshots now reflect that reverse_pages flag is set alongside reverse_row_groups when optimizing descending sorts.
All EXPLAIN plan outputs now reflect that reverse_pages flag is set alongside reverse_row_groups when optimizing descending sorts. Updated test files: - sort_pushdown.slt (18 instances) - topk.slt (1 instance) - create_external_table.slt (1 instance)
zhuqi-lucas
left a comment
There was a problem hiding this comment.
I don't think we can implement this until now, the arrow-rs and parquet both not support it.
Thank you for the feedback @zhuqi-lucas! You're absolutely right - that's exactly why this is Phase 1 infrastructure only. This PR does not attempt to implement actual page-level reversal. Instead, it:
Would you like me to add more explicit documentation in the code comments clarifying this Phase 1 vs Phase 2 split? I'm happy to update the PR if needed. |
Summary
Implements Phase 1 infrastructure for reverse page ordering in Parquet sort pushdown optimization, addressing issue #19486. This foundation establishes the flag infrastructure necessary for future page-level reversal implementation.
What's Changed
reverse_pagesflag toParquetSourcestruct with getter/setter methodsreverse_pagesfield toParquetOpenerstruct via builder patterntry_reverse_output()to set bothreverse_row_groupsandreverse_pagesflags when optimizing descending sortsreverse_pageswhen enabledArchitecture
This implementation follows the established pattern of
reverse_row_groups:try_reverse_output()ensures coordination with row group reversalTesting
reverse_pagesfunctionality:test_reverse_pages_default_value- Verifies default is falsetest_reverse_pages_with_setter- Verifies setter works correctlytest_reverse_pages_clone_preserves_value- Ensures cloning preserves statetest_reverse_pages_independent_of_reverse_row_groups- Confirms independent flag operationcargo fmt- properly formattedcargo clippy -D warnings- no warningsPhase 1 Design Rationale
This Phase 1 implementation establishes infrastructure for future page-level reversal. Actual page reversal implementation is deferred to Phase 2 because:
ParquetRecordBatchStreamBuildercurrently lacks public APIs for page-level reversalPhase 2 can implement actual page reversal once arrow-rs provides necessary page-level APIs or alternative approaches are available.
Files Modified
datafusion/datasource-parquet/src/source.rs- Added reverse_pages field and methodsdatafusion/datasource-parquet/src/opener.rs- Added reverse_pages field to builderRelated Issues
Fix #19486